[AutoRound] Support WAN2.2 W4A16 quantization model by lvliang-intel · Pull Request #3353 · vllm-project/vllm-omni

lvliang-intel · 2026-05-05T13:06:02Z


Purpose

Add AutoRound W4A16 quantization support for Wan2.2 pipelines and transformer modules.
https://huggingface.co/Intel/Wan2.2-TI2V-5B-Diffusers-int4-AutoRound
https://huggingface.co/Intel/Wan2.2-I2V-A14B-Diffusers-int4-AutoRound
https://huggingface.co/Intel/Wan2.2-T2V-A14B-Diffusers-int4-AutoRound
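W4A16 means 4-bit integer weights with 16-bit activations. Each linear layer's weights are stored as INT4 codes plus a per-group scale and zero point, and they are dequantized back to 16-bit at matmul time. The sketch below illustrates that storage format using plain round-to-nearest (RTN) asymmetric quantization over groups of 128 weights. It is an illustration only: AutoRound additionally optimizes the rounding and clipping, and the group size, asymmetric scheme, and nibble packing order used here are assumptions, not the settings of the checkpoints above.

```python
import random
from typing import List, Tuple

QMAX = 15  # unsigned 4-bit codes take values 0..15


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric INT4 quantization of one weight group (not AutoRound's tuned rounding)."""
    # Include 0 in the range (as GPTQ-style quantizers do) so the zero point lands in 0..15.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / QMAX if w_max > w_min else 1.0
    zero = round(-w_min / scale)
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate weights; a W4A16 kernel performs this step before the 16-bit matmul."""
    return [(q - zero) * scale for q in codes]


def pack_int4(codes: List[int]) -> bytes:
    """Pack two 4-bit codes per byte, low nibble first (an assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def quantize_row(row: List[float], group_size: int = 128):
    """Split a weight row into groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


# Demo on a random stand-in for one weight row.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize_row(row)
restored = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
packed = b"".join(pack_int4(codes) for codes, _, _ in groups)
print(len(packed), "bytes of packed codes (scales excluded) vs", 2 * len(row), "bytes in 16-bit")
print("max abs error:", max(abs(a - b) for a, b in zip(row, restored)))
```

With RTN, each dequantized weight is within half a quantization step (scale / 2) of the original; AutoRound's contribution is choosing the rounding and clipping to reduce the error that matters for the model output.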

Related: #1325, #1777, #2670

Test Plan

Run the unit tests
Run the VBench accuracy evaluation

Test Result

Raw Scores

Dimension	Wan2.2-I2V-A14B-Diffusers	Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	Wan2.2-T2V-A14B-Diffusers	Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound
Subject Consistency	0.9752	0.9741	0.9508	0.9578
Background Consistency	0.9704	0.9691	0.9449	0.9465
Aesthetic Quality	0.6241	0.6089	0.5730	0.5980
Imaging Quality	0.6832	0.6679	0.6623	0.6591

Aggregate by Category

Category	Wan2.2-I2V-A14B-Diffusers	Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	Wan2.2-T2V-A14B-Diffusers	Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound
Consistency	0.9728	0.9716	0.9478	0.9522
Quality	0.6537	0.6384	0.6176	0.6286

Evaluated Dimension Average

Model	Dimensions Evaluated	Avg Score
Wan2.2-I2V-A14B-Diffusers	4	0.8132
Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	4	0.8050
Wan2.2-T2V-A14B-Diffusers	4	0.7827
Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound	4	0.7904
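The category and overall figures can be re-derived from the raw scores. The sketch below assumes each category is the unweighted mean of its two dimensions (Consistency from Subject and Background Consistency, Quality from Aesthetic and Imaging Quality) and that the overall figure is the mean of all four dimensions. That grouping is inferred from the numbers rather than stated in the PR; it reproduces the reported values to within rounding of the last digit.

```python
# model: (subject, background, aesthetic, imaging), copied from the Raw Scores table
RAW = {
    "Wan2.2-I2V-A14B-Diffusers": (0.9752, 0.9704, 0.6241, 0.6832),
    "Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound": (0.9741, 0.9691, 0.6089, 0.6679),
    "Wan2.2-T2V-A14B-Diffusers": (0.9508, 0.9449, 0.5730, 0.6623),
    "Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound": (0.9578, 0.9465, 0.5980, 0.6591),
}

PAIRS = [
    ("Wan2.2-I2V-A14B-Diffusers", "Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound"),
    ("Wan2.2-T2V-A14B-Diffusers", "Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound"),
]


def summarize(scores):
    """Category means (assumed grouping) and the four-dimension average."""
    subject, background, aesthetic, imaging = scores
    return {
        "consistency": (subject + background) / 2,
        "quality": (aesthetic + imaging) / 2,
        "average": (subject + background + aesthetic + imaging) / 4,
    }


for model, scores in RAW.items():
    s = summarize(scores)
    print(f"{model}: consistency={s['consistency']:.4f} "
          f"quality={s['quality']:.4f} average={s['average']:.4f}")

# Change in the four-dimension average when switching to the Int4 checkpoint.
for base, quant in PAIRS:
    delta = summarize(RAW[quant])["average"] - summarize(RAW[base])["average"]
    print(f"{quant}: {delta:+.4f} vs {base}")
```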

Generation Statistics

Model	Success Rate (%)	Avg Latency (s)	Avg Memory (MB)	Speedup vs Ref	Memory Ratio vs Ref
Wan2.2-I2V-A14B-Diffusers	100.0	377.31	76309.33	1.00x	1.00x
Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound	100.0	439.68	36893.00	0.86x	0.48x
Wan2.2-T2V-A14B-Diffusers	100.0	736.09	76298.33	1.00x	1.00x
Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound	100.0	863.49	36891.33	0.85x	0.48x
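The two ratio columns follow from the latency and memory columns: speedup is the reference latency divided by the model latency (so a value below 1.00x means the quantized model is slower), and memory ratio is the model memory divided by the reference memory. A minimal check using the table values; the function names are illustrative:

```python
# model: (avg latency in s, avg memory in MB), copied from the Generation Statistics table
STATS = {
    "Wan2.2-I2V-A14B-Diffusers": (377.31, 76309.33),
    "Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound": (439.68, 36893.0),
    "Wan2.2-T2V-A14B-Diffusers": (736.09, 76298.33),
    "Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound": (863.49, 36891.33),
}

# (reference model, quantized model)
PAIRS = [
    ("Wan2.2-I2V-A14B-Diffusers", "Wan2.2-I2V-A14B-Diffusers-Int4-AutoRound"),
    ("Wan2.2-T2V-A14B-Diffusers", "Wan2.2-T2V-A14B-Diffusers-Int4-AutoRound"),
]


def speedup(ref_latency_s: float, latency_s: float) -> float:
    """Greater than 1.0 means faster than the reference; less than 1.0 means slower."""
    return ref_latency_s / latency_s


def memory_ratio(ref_mb: float, mb: float) -> float:
    """Fraction of the reference memory used by the model."""
    return mb / ref_mb


def report():
    rows = []
    for ref, quant in PAIRS:
        (ref_lat, ref_mem), (lat, mem) = STATS[ref], STATS[quant]
        rows.append((quant, f"{speedup(ref_lat, lat):.2f}x", f"{memory_ratio(ref_mem, mem):.2f}x"))
    return rows


for row in report():
    print(*row)
```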

The test is mainly intended to check accuracy. For video generation at batch size 1, INT4 W4A16 primarily saves memory (about 0.48x of the reference, which helps fit larger models or longer videos in VRAM) but does not necessarily improve latency: the quantized models ran at 0.85x to 0.86x of the reference speed, because the workload is compute-bound and weight dequantization adds significant overhead.
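A back-of-the-envelope estimate shows why weight-only INT4 shrinks the quantized weights to roughly a quarter of their 16-bit size, while the measured end-to-end ratio stays near 0.48x: the reported memory also covers 16-bit activations and, presumably, any modules that are not quantized. The sketch assumes one 16-bit scale and one 4-bit zero point per group of 128 weights and uses a hypothetical 5120 x 5120 linear layer; these are illustrative assumptions, not measurements from this PR.

```python
def bits_per_weight(weight_bits: int = 4, group_size: int = 128,
                    scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average storage bits per weight: the INT4 code plus the amortized per-group scale and zero point."""
    return weight_bits + (scale_bits + zero_bits) / group_size


def weight_memory_ratio(**kwargs) -> float:
    """Size of the quantized linear weights relative to the same weights stored in 16 bits."""
    return bits_per_weight(**kwargs) / 16


def linear_weight_mib(d_in: int, d_out: int, bits: float) -> float:
    """Storage for a d_in x d_out weight matrix at the given bits per weight, in MiB."""
    return d_in * d_out * bits / 8 / 2**20


# Hypothetical 5120 x 5120 projection (an illustrative shape, not read from the checkpoints).
print(linear_weight_mib(5120, 5120, 16))                 # 50.0
print(linear_weight_mib(5120, 5120, bits_per_weight()))  # 12.98828125
print(f"{weight_memory_ratio():.3f}")                    # 0.260
```

The same arithmetic hints at the latency side: every INT4 weight must be unpacked and rescaled before the 16-bit matmul, which is extra work when the step is already compute-bound.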

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.


chatgpt-codex-connector · 2026-05-05T13:06:10Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-05-05T21:11:07Z

Comprehensive benchmarks and well-structured tests. Memory reduction to 0.48x is significant for VRAM-constrained deployments. Two notes: 1) Checklist items at the bottom are unchecked - confirm documentation was updated if required. 2) Latency impact (0.86x speedup) is expected for compute-bound workloads at batch size 1, but consider documenting guidance for optimal batch sizes where dequantization overhead is amortized.

david6666666 · 2026-05-18T06:50:06Z

Merge conflicts need fixing before review. Thx.

david6666666 · 2026-05-19T08:59:02Z

LGTM now

david6666666 · 2026-05-19T09:00:40Z

@yenuo26 please check test

david6666666 · 2026-05-20T04:32:11Z

CI passed

Running test: test_wan22_diffusion_features[cuda_ti2v_cache_dit]
Running test: test_wan22_diffusion_features[cuda_t2v_cfg_parallel]
Running test: test_wan22_diffusion_features[cuda_t2v_ulysses_sp]
Running test: test_wan22_diffusion_features[cuda_t2v_tp_vae_patch]
Running test: test_wan22_diffusion_features[cuda_t2v_hsdp]
Running test: test_wan22_diffusion_features[cuda_t2v_ring_atten]
Running test: test_wan22_diffusion_features[cuda_i2v_cfg_parallel]
Running test: test_wan22_diffusion_features[cuda_i2v_ulysses_sp]
Running test: test_wan22_diffusion_features[cuda_i2v_tp_vae_patch]
Running test: test_wan22_diffusion_features[cuda_i2v_hsdp]
Running test: test_wan22_diffusion_features[cuda_i2v_ring_atten]
Running test: test_wan22_diffusion_features[cuda_ti2v_cfg_parallel]
Running test: test_wan22_diffusion_features[cuda_ti2v_ulysses_sp]
Running test: test_wan22_diffusion_features[cuda_ti2v_tp_vae_patch]
Running test: test_wan22_diffusion_features[cuda_ti2v_hsdp]
Running test: test_wan22_diffusion_features[cuda_ti2v_ring_atten]
Running test: test_wan_2_1_vace[single_card_001]
Running test: test_wan_2_1_vace[parallel_001]
Running test: test_wan_2_1_vace[parallel_002]
Running test: test_wan_2_1_vace[parallel_003]
Running test: test_wan_2_1_vace[parallel_004]
Running test: test_wan_2_1_vace[parallel_005]
Running test: test_hunyuan_video_15_t2v[single_card_cachedit_layerwise]
Running test: test_hunyuan_video_15_t2v[parallel_cachedit_tp2_vae2]
Running test: test_wan22_i2v_autoround_w4a16_generates_video[omni_runner0]
Running test: test_wan22_t2v_autoround_w4a16_generates_video[omni_runner0]
Running test: test_wan22_i2v_autoround_w4a16_quant_peak[omni_runner0]
Running test: test_wan22_i2v_autoround_w4a16_baseline_peak[omni_runner0]
Running test: test_wan22_t2v_autoround_w4a16_quant_peak[omni_runner0]
Running test: test_wan22_t2v_autoround_w4a16_baseline_peak[omni_runner0]
================================================== 34 passed, 2 deselected, 33 warnings in 4804.66s (1:20:04) ==================================================

david6666666 · 2026-05-20T06:31:57Z

please resolve conflicts, thx

david6666666 · 2026-05-21T06:12:20Z

please fix DCO

congw729 · 2026-05-21T06:32:50Z

Pls fix CI test: https://buildkite.com/vllm/vllm-omni/builds/10126/canvas?sid=019e4824-3105-4a82-a57b-367977644eb3&tab=output

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

…quantization support for Wan2.2 T2V / I2V inference on Ascend NPU (vllm-project#3578) Signed-off-by: hyh_hh <huyinghong1@huawei.com> Co-authored-by: hyh_hh <huyinghong1@huawei.com> Signed-off-by: lvliang-intel <liang1.lv@intel.com>

Signed-off-by: lvliang-intel <liang1.lv@intel.com> Signed-off-by: hyh_hh <huyinghong1@huawei.com> Co-authored-by: hxhhhlalala <hyh_hh@163.com> Co-authored-by: hyh_hh <huyinghong1@huawei.com> Signed-off-by: Advik <scince5678@gmail.com>

Signed-off-by: lvliang-intel <liang1.lv@intel.com> Signed-off-by: hyh_hh <huyinghong1@huawei.com> Co-authored-by: hxhhhlalala <hyh_hh@163.com> Co-authored-by: hyh_hh <huyinghong1@huawei.com>

lvliang-intel requested a review from hsliuustc0106 as a code owner May 5, 2026 13:06

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from 8ccefe8 to cbfbe97 May 5, 2026 13:16

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from cbfbe97 to 3a2cde7 May 6, 2026 01:49

yiliu30 mentioned this pull request May 7, 2026

[RFC]: Intel Auto-Round x vLLM-Omni Quantization Support (2026 H1) #1325

Open

3 tasks

This was referenced May 8, 2026

[RFC]: Continuous Quantization Support #1854

Open

[RFC] [0.22.0]: Quantization Support JiusiServe/vllm-omni#182

Closed

Gaohan123 added this to the v0.22.0 milestone May 11, 2026

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch from 3a2cde7 to 7616e04 May 19, 2026 03:28

lvliang-intel requested review from Gaohan123, Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride, wtomin and yenuo26 as code owners May 19, 2026 03:28

david6666666 added the ready label to trigger buildkite CI label May 19, 2026

yenuo26 reviewed May 19, 2026


lvliang-intel requested review from congw729 and lishunyang12 as code owners May 19, 2026 14:40

david6666666 added the diffusion-x2v-test label to trigger buildkite x2video series of diffusion models test in nightly CI label May 20, 2026

david6666666 removed the diffusion-x2v-test label to trigger buildkite x2video series of diffusion models test in nightly CI label May 20, 2026

david6666666 enabled auto-merge (squash) May 20, 2026 04:32

auto-merge was automatically disabled May 20, 2026 08:06
Head branch was pushed to by a user without write access

lvliang-intel requested review from ZeldaHuang, gcanlin, linyueqian, tzhouam, yuanheng-zhao and ywang96 as code owners May 20, 2026 08:06

lvliang-intel and others added 12 commits May 21, 2026 14:38

support autoround w4a16 for wan2.2

c2e3ae9

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix stage diffusion proc

de58ea3

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix i2v

d6ef8cc

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

snapshot sys.modules before iteration to prevent RuntimeError

6be4173

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
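The commit above addresses a standard Python pitfall: iterating over sys.modules while the loop body triggers an import mutates the dict and raises "RuntimeError: dictionary changed size during iteration". Iterating over a snapshot, list(sys.modules.items()), avoids it. A minimal sketch of the pattern with a hypothetical scan helper; it is not the PR's actual code:

```python
import sys
from typing import Callable, Dict, List


def scan(modules: Dict[str, object],
         visit: Callable[[str, object], None],
         snapshot: bool = True) -> List[str]:
    """Call visit(name, module) for every entry and return the visited names.

    With snapshot=True the loop runs over a copy, so entries that visit() adds
    (for example via an import side effect) do not break the iteration.
    With snapshot=False it iterates the live dict and raises RuntimeError
    as soon as the dict changes size.
    """
    items = list(modules.items()) if snapshot else modules.items()
    visited = []
    for name, module in items:
        visit(name, module)
        visited.append(name)
    return visited


# Safe on the real sys.modules even if a visitor imports something.
names = scan(sys.modules, lambda name, module: None)
print(len(names), "modules scanned")
```

Entries added during the scan are simply not visited; that is usually the desired behavior when patching or inspecting already-loaded modules.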

add test

24ecb6d

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix pre-commit

9b1737d

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

update doc

0d07190

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

remove unnecessary import

5bc949a

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix lint

c3c2d71

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

adapt test code according to comments

40dd76a

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix pre-commit

e261e3a

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

lvliang-intel force-pushed the feats/ar-w4a16-wan22 branch 2 times, most recently from e2c5156 to f75eaf3 May 21, 2026 06:39

david6666666 approved these changes May 21, 2026


david6666666 merged commit 8297570 into vllm-project:main May 21, 2026
8 of 9 checks passed

lvliang-intel mentioned this pull request May 28, 2026

[vllm-omni]: Omni Quant Support intel/auto-round#1507

Open

david6666666 merged 12 commits into vllm-project:main from lvliang-intel:feats/ar-w4a16-wan22
